Univariate model to find breeding values through regression with optional resampling techniques (Xavier et al. 2017) and polygenic term (Kernel). See "Details" for additional standalone functions written in C++.
wgr(y,X,it=1500,bi=500,th=1,bag=1,rp=FALSE,iv=FALSE,de=FALSE,
pi=0,df=5,R2=0.5,eigK=NULL,VarK=0.95,verb=FALSE)
The function wgr returns a list with expected value from the marker effect (\(b\)), probability of marker being in the model (\(d\)), regression coefficient (\(g\)), variance of each marker (\(Vb\)), the intercept (\(mu\)), the polygene (\(u\)) and polygenic variance (\(Vk\)), residual variance (\(Ve\)) and the fitted value (\(hat\)).
Numeric vector of observations (\(n\)) describing the trait to be analyzed. NA
is allowed.
Numeric matrix containing the genotypic data. A matrix with \(n\) rows of observations and (\(m\)) columns of molecular markers.
Integer. Number of iterations or samples to be generated.
Integer. Burn-in, the number of iterations or samples to be discarted.
Integer. Thinning parameter, used to save memory by storing only one every 'th' samples.
If different than one, it indicates the proportion of data to be subsampled in each Markov chain. For datasets with moderate number of observations, values of bag from 0.30 to 0.60 may speed up computation without losses in predicion properties. This argument enable users to enhance MCMC through subsampling (Xavier et al. 2017).
Logical. Use replacement for bootstrap samples when bag is different than one.
Logical. Assign markers independent variance, a T prior from a mixture of normals. If true, turns the default model BLUP into BayesA.
Logical. Assign markers independent variance through double-exponential prior. If true, turns the default model BLUP into Bayesian LASSO. This argument overides iv.
Value between 0 and 1. If greater than zero it activates variable selection, where markers have expected probability pi of having null effect.
Prior degrees of freedom of variance components.
Expected R2, used to calculate the prior shape.
Output of function eigen
. Spectral decomposition of the kernel used as a second random effect (eg. pedigree matrix).
Numeric between 0 and 1. For reduction of dimensionality. Indicates the proportion of variance explained by Eigenpairs used to fit second random effect.
Logical. If verbose is TRUE, function displays MCMC progress bar.
Alencar Xavier
The model for the whole-genome regression is as follows:
$$y = mu + Xb + u + e$$
where \(y\) is the response variable, \(mu\) is the intercept, \(X\) is the genotypic matrix, \(b\) is the regression coefficient or effect of an allele substitution, with \(d\) probability of being included into the model, \(u\) is the polygenic term if a kernel is used, and \(e\) is the residual term.
Users can obtain four WGR methods out of this function: BRR (pi=0,iv=F), BayesA (pi=0,iv=T), BayesB (pi=0.95,iv=T), BayesC (pi=0.95,iv=F) and Bayesian LASSO or BayesL (pi=0,de=T). Theoretical basis of each model is described by de los Campos et al. (2013).
Gibbs sampler that updates regression coefficients is adapted from GSRU algorithm (Legarra and Misztal 2008). The variable selection of functions \(wgr\), \(BayesB\) and \(BayesC\) works through the unconditional prior algorithm proposed by Kuo and Mallick (1998), whereas \(BayesCpi\) and \(BayesDpi\) are based on Metropolis-Hastings. Prior shape estimates are computed as Sb = R2*df*var(y)/MSx and Se = (1-R2)*df*var(y), with an exception for \(BayesC\) and \(BayesCpi\) where the prior shape is Sb = R2*df*var(y)/MSx/(1-pi). The polygenic term is solved by Bayesian algorithm of reproducing kernel Hilbert Spaces proposed by de los Campos et al. (2010).
In addition to wgr
, standalone C++ functions available include:
01) BayesA(y,X,it=1500,bi=500,df=5,R2=0.5)
02) BayesB(y,X,it=1500,bi=500,pi=0.95,df=5,R2=0.5)
03) BayesC(y,X,it=1500,bi=500,pi=0.95,df=5,R2=0.5)
04) BayesCpi(y,X,it=1500,bi=500,df=5,R2=0.5)
05) BayesDpi(y,X,it=1500,bi=500,df=5,R2=0.5)
06) BayesL(y,X,it=1500,bi=500,df=5,R2=0.5)
07) BayesRR(y,X,it=1500,bi=500,df=5,R2=0.5)
The implementations that support two random effects include:
08) BayesA2(y,X1,X2,it=1500,bi=500,df=5,R2=0.5)
09) BayesB2(y,X1,X2,it=1500,bi=500,pi=0.95,df=5,R2=0.5)
10) BayesRR2(y,X1,X2,it=1500,bi=500,df=5,R2=0.5)
And the cross-validation for the C++ implementations, with arguments analogous to emCV
.
mcmcCV(y,gen,k=5,n=5,it=1500,bi=500,pi=0.95,df=5,R2=0.5,avg=T,llo=NULL,tbv=NULL,ReturnGebv=FALSE)
de los Campos, G., Hickey, J. M., Pong-Wong, R., Daetwyler, H. D., and Calus, M. P. (2013). Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics, 193(2), 327-345.
de los Campos, G., Gianola, D., Rosa, G. J., Weigel, K. A., & Crossa, J. (2010). Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genetics Research, 92(04), 295-308.
Kuo, L., & Mallick, B. (1998). Variable selection for regression models. Sankhya: The Indian Journal of Statistics, Series B, 65-81.
Legarra, A., & Misztal, I. (2008). Technical note: Computing strategies in genome-wide selection. Journal of dairy science, 91(1), 360-366.
Xavier, A., Xu, S., Muir, W., & Rainey, K. M. (2017). Genomic prediction using subsampling. BMC bioinformatics, 18(1), 191.
if (FALSE) {
# Load data
data(tpod)
# BLUP
fit_BRR = wgr(y,gen,iv=FALSE,pi=0)
cor(y,fit_BRR$hat)
# BayesA
fit_BayesA = wgr(y,gen,iv=TRUE,pi=0)
cor(y,fit_BayesA$hat)
# BayesB
fit_BayesB = wgr(y,gen,iv=TRUE,pi=.95)
cor(y,fit_BayesB$hat)
# BayesC
fit_BayesC = wgr(y,gen,iv=FALSE,pi=.95)
cor(y,fit_BayesC$hat)
# BayesCpi
fit_BayesCpi = BayesCpi(y,gen)
cor(y,fit_BayesCpi$hat)
# BayesDpi
fit_BayesDpi = BayesDpi(y,gen)
cor(y,fit_BayesDpi$hat)
# BayesL
fit_BayesL = wgr(y,gen,de=TRUE)
cor(y,fit_BayesL$hat)
# Bagging BLUP
fit_Bag = wgr(y,gen,bag=0.5)
cor(y,fit_Bag$hat)
}
Run the code above in your browser using DataLab